Statistical methods
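- fit a single model (e.g., a Gaussian) to all the data; points with low probability under the model are outliers
- fit two models, one to the non-outliers and one to the outliers, and assign each point to the model under which it is more likely
- Grubbs’ test: tests whether the most extreme value is an outlier, assuming approximately normal data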
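
As a rough illustration of the single-model approach above, the sketch below fits a Gaussian by its sample mean and standard deviation and flags points by z-score. The function names, the sample data, and the threshold of 2.0 are illustrative assumptions, not part of the original notes. It also computes the Grubbs statistic G = max |x_i - mean| / s, but not the critical value that a full Grubbs' test compares it against.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return indices of values whose |z-score| exceeds `threshold`.

    Assumes roughly Gaussian data with at least two distinct values;
    `threshold` is an illustrative choice, not a recommended setting.
    """
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def grubbs_statistic(values):
    """Grubbs' statistic G = max |x_i - mean| / s for the most extreme value."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return max(abs(v - mean) for v in values) / std


# Hypothetical sample: eight readings near 10 and one far away.
data = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9, 25.0]
print(zscore_outliers(data))   # index of the flagged point(s)
print(grubbs_statistic(data))  # test statistic only, no decision
```

To turn G into a test decision it has to be compared with a critical value derived from the t-distribution at the chosen significance level. Note that a single extreme point inflates both the mean and the standard deviation, so z-scores stay small in small samples.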
Distance-based methods
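- local density: a point with few neighbors within a given neighborhood is likely to be an outlier
- nearest-neighbor distance: a point far from its nearest neighbor is likely to be an outlier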
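
Both distance-based scores above can be sketched with brute-force pairwise distances. This is a minimal sketch: the function names, the radius, the choice of k, and the sample points are assumptions made for illustration, and the O(n^2) loops are only suitable for small data sets.

```python
import math


def knn_distance(points, k=1):
    """Distance from each point to its k-th nearest neighbor (larger = more outlying)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def neighbor_count(points, radius):
    """Number of other points within `radius` of each point (smaller = more outlying)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


# Hypothetical 2-D sample: a tight group near the origin and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = knn_distance(points, k=1)
counts = neighbor_count(points, radius=1.5)
print(scores.index(max(scores)))  # index of the point farthest from its neighbor
print(counts)
```

Using the k-th rather than the first nearest neighbor makes the score less sensitive to small groups of outliers that sit close to each other.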
Learning-based methods
- clustering: the smallest cluster is likely to contain the outliers
- one-class classifier (e.g., one-class SVM)
- binary classifier trained on labeled outliers and non-outliers (e.g., naive Bayes for spam filtering, weighted binary SVM)
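
The clustering idea in the list above can be sketched with a small k-means implementation that reports the members of the smallest cluster. This is one possible reading of the method, not a reference implementation: the function names, the farthest-first initialization, the number of clusters, and the sample points are all assumptions chosen to keep the example short and deterministic.

```python
import math
from collections import Counter


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def smallest_cluster(points, k):
    """Indices of the points in the smallest cluster (outlier candidates)."""
    labels = kmeans(points, k)
    sizes = Counter(labels)
    rare = min(sizes, key=sizes.get)
    return [i for i, lab in enumerate(labels) if lab == rare]


# Hypothetical sample: two groups of four points and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (20, 20)]
print(smallest_cluster(pts, k=3))
```

The result depends on the choice of k: with too few clusters the isolated point is absorbed into a larger group, so the smallest cluster should be treated as a set of candidates rather than confirmed outliers.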